Python for Data Science For Dummies by Luca Massaron & John Paul Mueller

Python for Data Science For Dummies by Luca Massaron & John Paul Mueller

Author:Luca Massaron & John Paul Mueller [Luca Massaron]
Language: eng
Format: epub
Tags: Python
Publisher: For Dummies
Published: 2015-07-06T21:00:00+00:00


Stretching Python’s Capabilities

In This Chapter

Understanding how Scikit-learn works with classes

Using sparse matrices and the hashing trick

Testing performances and memory consumption

Saving time with multicore algorithms

If you’ve gone through the previous chapters, by this point you’ve dealt with all the basic data loading and manipulation methods offered by Python. Now it’s time to start using some more complex instruments for data wrangling (or munging) and for machine learning. The final step of most data science projects is to build a data tool able to automatically summarize, predict, and recommend directly from your data.

Before taking that final step, you still have to massage your data by enforcing transformations that are even more radical. That’s the data wrangling or data munging part, where sophisticated transformations are followed by visual and statistical explorations, and then again by further transformations. In the following sections, you learn how to handle huge streams of text, explore the basic characteristics of a dataset, optimize the speed of your experiments, compress data and create new synthetic features, generate new groups and classifications, and detect unexpected or exceptional cases that may cause your project to go wrong.

From here onward, you use the Scikit-learn package more and more (which means knowing more about it — the full documentation appears at http://scikit-learn.org/stable/documentation.html). The Scikit-learn package, in fact, offers a single repository containing almost all the tools that you need to be a data scientist and for your data science project to be successful. In this chapter, you discover important characteristics of Scikit-learn, structured in modules, classes, and functions, and some advanced Python time savers for improving performance with big unstructured data and highly time-consuming computational operations.

You don’t have to type the source code for this chapter in by hand. In fact, it’s a lot easier if you use the downloadable source (see the Introduction for download instructions). The source code for this chapter appears in the P4DS4D; 12; Stretching Pythons Capabilities.ipynb source code file.



Download



Copyright Disclaimer:
This site does not store any files on its server. We only index and link to content provided by other sites. Please contact the content providers to delete copyright contents if any and email us, we'll remove relevant links or contents immediately.